Abstract

For more than a decade, the U.S. Navy has been modernizing many of its
software-intensive National Security Systems (NSS) using an Open Architecture (OA)
approach  that  leverages  capable  and  reliable  commercial  off-the-shelf  (COTS) 
technologies  and  modern,  agile  software  development  practices.  The  focus  of  the 
Naval  Open  Architecture  strategy  has  been  to  field  affordable  and  superior 
capabilities  more  rapidly  at  reduced  costs.  NSS  and  information  technology  (IT) 
system  upgrades  are  now  routinely  accomplished  using  COTS,  proving  that  the  U.S. 
Navy has achieved measurable success in this area. But this progress has not
improved  the  environment  of  life  cycle  cost  savings  and  system  sustainment.  The 
Integrated  Logistics  Support  (ILS)  elements  of  most  acquisition  programs  are  not 
taking  full  advantage  of  industry  best  practices  that  are  robust  and  mature  for  life 
cycle  affordability  and  sustainment.  There  is  great  cost  savings  potential  in  this  area, 
as  the  cost  of  ownership  of  a  system  aboard  a  ship  over  its  life  cycle  for  repair  and 
maintenance  far  exceeds  the  Navy’s  initial  investment  in  design  and  production. 

This  paper  gives  an  overview  of  Maintenance  Free  Operating  Period  (MFOP)  pilot 
implementations  that  have  been  deployed  twice  aboard  Navy  ships.  It  will  describe  a 
fundamentally  new  system  sustainment  approach  and  acquisition  techniques,  which 
show  how  MFOP  is  a  viable  alternative  to  traditional  ILS  life  cycle  methods.  Finally, 
we  will  argue  that  system  designs  using  the  MFOP  approach  are  generally  superior 
in  terms  of  cost,  performance,  and  resource  management. 
Introduction

For more than a decade, the U.S. Navy has been modernizing many of its software-intensive
National Security Systems (NSS) using an Open Architecture (OA) approach that
leverages  capable  and  reliable  commercial  off-the-shelf  (COTS)  technologies  and  modern, 
agile  software  development  practices.  The  focus  of  the  Naval  Open  Architecture  strategy 
has  been  to  field  affordable  and  superior  capabilities  more  rapidly  at  reduced  costs.  NSS 
and  information  technology  (IT)  system  upgrades  are  now  routinely  accomplished  using 
COTS, proving that the U.S. Navy has achieved measurable success in this area. But this
progress  has  not  improved  the  environment  of  life  cycle  cost  savings  and  system 
sustainment.  The  Integrated  Logistics  Support  (ILS)  elements  of  most  acquisition  programs 
are  not  taking  full  advantage  of  industry  best  practices  that  are  robust  and  mature  for  life 
cycle  affordability  and  sustainment.  There  is  great  cost  savings  potential  in  this  area,  as  the 
cost  of  ownership  of  a  system  aboard  a  ship  over  its  life  cycle  for  repair  and  maintenance  far 
exceeds  the  Navy’s  initial  investment  in  design  and  production. 

This  paper  gives  an  overview  of  Maintenance  Free  Operating  Period  (MFOP)  pilot 
implementations  that  have  been  deployed  twice  aboard  Navy  ships.  It  will  describe  a 
fundamentally  new  system  sustainment  approach  and  acquisition  techniques,  which  show 
how  MFOP  is  a  viable  alternative  to  traditional  ILS  life  cycle  methods.  Finally,  we  will  argue 
that  system  designs  using  the  MFOP  approach  are  generally  superior  in  terms  of  cost, 
performance,  and  resource  management. 

Why  Maintenance  Free  Operating  Periods? 

The  simple  answer  is  that  an  OA/MFOP  enabled  system  saves  money  and  provides 
the  warfighter  with  a  product  that  is  better,  cheaper,  and  faster: 

1. Better because the MFOP design yields more operational availability to the
warfighter. 

2.  Cheaper  because  there  is  less  material,  infrastructure,  and  training  to  provide 
and  manage  through  the  elimination  of  platform/system  level,  material 
support  packages. 

3.  Faster  because  distance  support  techniques  eliminate  delays  in  supporting 
fielded  products  and  are  available  world-wide. 

The  Maintenance  Free  Operating  Period  Defined 

The  Maintenance  Free  Operating  Period  (MFOP)  is  defined  as  the  specified  period 
of  time  that  a  system  must  be  available  in  support  of  its  required  mission,  with  a  specified 
level  of  reliability,  and  with  no  open  cabinet  maintenance.  Commercially  available  methods 
and  products  support  very  high  probability  of  system  availability,  approaching  99%  or 
greater.  In  general  terms,  Reliability  (of  mission  time)  is  stated  as  follows: 

R(t) = e^{-t/MTBF},

where  t  is  the  mission  time  (required  MFOP),  and  MTBF  is  system  Mean  Time  Between 
Failure  under  stated  conditions. 
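
As a worked illustration of the formula above, the following minimal sketch evaluates R(t)
for a 180-day mission and inverts the formula to find the MTBF needed to reach a 99%
target. The 50,000-hour MTBF is a hypothetical figure chosen only for illustration; it is
not a value from the demonstration.

```python
import math

HOURS_PER_DAY = 24.0


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """R(t) = exp(-t / MTBF), valid under a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)


def required_mtbf(mission_hours: float, target_reliability: float) -> float:
    """Invert R(t): the MTBF at which R(t) equals the target, -t / ln(R)."""
    return -mission_hours / math.log(target_reliability)


mission = 180 * HOURS_PER_DAY  # 180-day MFOP expressed in hours (4,320 h)

# Hypothetical single component with a 50,000-hour MTBF (assumed value).
print(f"R(180 days) = {reliability(mission, 50_000.0):.3f}")

# MTBF a non-redundant system would need for a 99% chance of completing the MFOP.
print(f"MTBF needed for 99% = {required_mtbf(mission, 0.99):,.0f} hours")
```

The second figure comes out at roughly 430,000 hours, far above what a single commodity
component typically offers, which is why the design relies on redundancy rather than on
component MTBF alone.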

An  MFOP-enabled  system  is  inherently  reliable  with  continuous  health  monitoring 
status  to  provide  confidence  that  the  tactical  application  availability  requirement  is  highly 
likely  to  be  met.  To  achieve  this,  the  MFOP  system  has  the  following  design  enablers: 
1. Fault Tolerant Design,

2.  Data  Collection,  and 

3.  Remote  Connectivity. 

Fault tolerant COTS-based designs use vendor-supplied Mean Time Between Failure (MTBF)
data as a starting point. The system is then constructed based on a reliability block
diagram that provides sufficient redundancy to meet the required level of reliability,
accounting for the MTBF levels of the included components. Note that vendor MTBF data is
usually provided to users based upon specific conditions, generally a benign laboratory
environment.
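
A reliability block diagram of this kind can be evaluated with a few lines of code. The
sketch below is not the analysis tool used in the demonstration; it assumes independent
components with exponential lifetimes, no repair during the mission, and perfect failover,
and it models the system as a series chain of "k-out-of-n" redundant groups. The group
sizes and MTBF values are hypothetical.

```python
import math
from math import comb


def component_reliability(t_hours: float, mtbf_hours: float) -> float:
    """Survival probability of one component under a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


def k_of_n(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent components survive."""
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def system_reliability(groups, t_hours: float) -> float:
    """Series chain of redundant groups; each group is (k_needed, n_installed, mtbf)."""
    result = 1.0
    for k, n, mtbf in groups:
        result *= k_of_n(k, n, component_reliability(t_hours, mtbf))
    return result


# Hypothetical configuration: (needed, installed, MTBF in hours). Assumed values.
with_spares = [(2, 4, 200_000.0), (2, 4, 100_000.0), (1, 2, 300_000.0)]
no_spares = [(2, 2, 200_000.0), (2, 2, 100_000.0), (1, 1, 300_000.0)]

for label, hours in [("180 days", 180 * 24), ("1 year", 365 * 24), ("4 years", 4 * 365 * 24)]:
    print(label,
          f"with spares: {system_reliability(with_spares, hours):.4f}",
          f"without: {system_reliability(no_spares, hours):.4f}")
```

Comparing the two configurations over several mission times shows how embedded spares
raise the probability of mission success relative to a system with no redundancy.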

Open  Architecture  and  the  MFOP  Evolution 

Open  Architecture  (OA)  is  a  collection  of  best  practices,  technical  and  business,  and 
when  combined  with  a  willing  corporate  culture,  can  result  in  a  highly  effective  life  cycle 
strategy  in  which  total  cost  of  ownership  is  minimized  and  capabilities  to  the  warfighter  are 
maximized. 

The Navy has extended the Modular Open Systems Approach (MOSA) work performed by the
DoD’s Open Systems Joint Task Force (OSJTF) to more comprehensively achieve the desired
goals of open architecture as a part of the Naval Open Architecture (NOA) effort. NOA is
defined as the confluence of business and technical practices yielding modular,
interoperable systems that adhere to open standards with published interfaces. It is the
goal of the Naval Open Architecture effort to “field common, interoperable capabilities
more rapidly at reduced costs” (Updated Naval OA Strategy, 2008).

The Navy and Marine Corps are incorporating OA into selected new-start acquisitions or
upgrades to existing programs, such as Common Afloat Network Enterprise Services
(CANES), Submarine Warfare Federated Tactical Systems (SWFTS), Joint Counter
Radio-Controlled Improvised Explosive Device Electronic Warfare (JCREW), and others
(Fein, 2009).

The following are the core principles of the Open Systems Architecture approach
(Guertin & Clements, 2010):

1. Modular designs with loose coupling and high cohesion that allow for
independent  acquisition  of  system  components; 

2.  Continuous  design  disclosure  and  appropriate  use  of  data  rights  allowing 
greater  visibility  into  an  unfolding  design  and  flexibility  in  acquisition  of 
alternatives; 

3.  Enterprise  investment  strategies  that  maximize  reuse  of  system  designs  and 
reduce  total  ownership  costs  (TOC); 

4.  Enhanced  transparency  of  system  design  through  open  peer  reviews; 

5.  Competition  and  collaboration  through  development  of  alternative  solutions 
and  sources; 

6.  Analysis  to  determine  which  components  will  provide  the  best  return  on 
investment (ROI) to open, i.e., which components will change most often
due  to  technology  upgrades  or  parts  obsolescence  and  have  the  highest 
associated  cost  over  the  life  cycle. 
Achievement of these six principles requires an affirmative answer to a fundamental
question:  Can  a  qualified  third  party  add,  modify,  replace,  remove,  or  provide  support  for  a 
component  of  a  system,  based  only  on  openly  published  and  available  technical  and 
functional  specifications  of  the  component  of  that  system? 

OA  is  ultimately  about  enabling  acquisition  choice.  When  program  managers  can 
compete  for  products  and  services  across  a  system  design,  they  can  establish  an 
environment  of  continuous  competition  for  the  best  possible  solution  at  the  best  possible 
price. 

MFOP  Evolution 

Since  2005,  two  MFOP  pilots  have  been  conducted  on  Navy  ships: 

■  Submarine  MFOP  Pilot  Program.  The  AN/BQQ-10  (a.k.a.,  Acoustic  Rapid 
COTS  Insertion,  or  ARCI)  submarine  tactical  sonar  system  is  the  premier 
example  program  for  an  Open  Architecture  (OA)  in  the  Navy.  This  program 
pioneered  OA  in  the  Navy/Marine  Corps.  In  2005,  four  submarines  were 
augmented  with  additional  embedded  servers  and  additional  design  elements 
to  ensure  a  90-day  MFOP  period  for  tactical  software  availability  within  the 
MFOP  boundary.  The  rest  of  the  system  was  managed  using  the  traditional 
ILS support system. Five years later, the commercial computing industry has
greatly improved the tools and techniques, which can now tackle the full
range of technical challenges that confronted the earlier attempts.

■  Surface  Ship  MFOP  Demonstration.  This  was  conducted  in  2010  as  a 
comprehensive OA/MFOP demonstration aboard USS Iwo Jima (LHD 7).
The demonstration exercised the Navy’s evolving concepts for risk reduction
and  cost  savings,  as  well  as  exploring  the  full  extent  of  the  MFOP  concept. 
This  demonstration  relied  on  reuse  of  two  different  operational  software 
assets,  one  from  the  Navy’s  Software  Hardware  Asset  Reuse  Enterprise 
(SHARE)  repository,  and  the  other  through  program/domain  awareness. 
These  Navy-funded  designs  were  combined  with  commercially  available 
management  capabilities  and  re-hosted  on  a  highly  reliable  commercial  blade 
center  with  embedded  spares  that  was  designed  for  the  entire  system 
boundary.  In  this  demonstration,  the  system  MFOP  period  was  doubled  to 
180  days,  and  the  certified  support  package  provided  in  the  temporary 
installation  (TEMPALT)  had  zero  maintenance  support  items  provided  to  the 
ship. 

Case  Study:  The  Surface  Ship  OA/MFOP  Demonstration 
Requirements  and  Approach 

The  object  of  the  Surface  Ship  OA/MFOP  Proof  of  Concept  demonstration  was  to 
develop a scalable and extensible demonstration system that would provide a greater than
99% probability of availability for a tactical capability under test. Success would be measured by
completing  a  deployment  on  a  combat  ship  of  180  days  with  no  open  cabinet  maintenance, 
while  eliminating  the  traditional  shipboard  maintenance  support  package.  All  design 
decisions  associated  with  the  implementation  methods  were  targeted  for  an  NSS  of  scale 
and  complexity,  so  that  these  lessons  and  designs  could  be  used  for  large-scale  programs 
such as PEO C4I’s CANES, PEO SUB’s SWFTS, PEO LMW’s Littoral Combat Ship Mission
Module  program,  and  PEO  IWS’s  AEGIS,  among  others. 

For  control  purposes,  the  system  required  an  operational  capability  from  which  to 
measure  system  availability  and  design  for  reliability.  The  Common  Network  Interface  (CNI) 
software  application,  originally  contracted  by  PEO  IWS  6  for  Amphibious  Assault  Ships  and 
developed  by  GD-AIS,  was  selected.  The  specific  version  of  CNI  used  in  the  demonstration 
was selected due to its availability in the SHARE repository and the willingness of the originating
program  office  to  support  the  demonstration.  A  suitable  hardware  platform,  that  is,  one  that 
would  be  typical  of,  and  extensible  to,  a  shipboard  tactical  information  system,  was  then 
configured  to  ensure  CNI  would  be  operationally  available  for  the  stated  mission  time. 

OA/MFOP  Demonstration  System  Design 

Three  particular  design  features  were  used  in  the  surface  ship  demonstration  system 
(see  Figure  1): 

■  Fault  Tolerance.  The  hardware  platform  was  made  fault  tolerant  by  adding 
and  embedding  redundancy  based  on  the  hardware  vendor’s  supplied 
component  MTBF  data,  and  adding  a  method  for  controlling  spare  resources 
(failover). 

■  Data  Capture  and  Collection.  All  components,  including  power  and  cooling 
devices,  were  monitored,  either  through  built-in  Simple  Network  Management 
Protocol  (SNMP)  message  traps,  or  more  sophisticated  software  agents 
running  in  data  servers.  This  data  was  continuously  collected  for  online 
assessment  and  post  mission  analyses. 

■  Remote  Connectivity.  The  system  was  connected  to  SIPRNET.  The 
purpose  of  this  link  was  to  collect  reliability  performance  information  for  online 
assessment,  and  to  allow  subject-matter  experts  (SMEs)  ashore  to  restore 
system  operation  in  the  event  of  a  software  failure. 
Figure 1. OA/MFOP Enabled System Design Elements

The following paragraphs detail the considerations that went into the design and
selection  of  products  for  the  OA/MFOP  system. 

Fault  Tolerance 

The  OA/MFOP  enabled  system  tolerates  faults  by  embedding  (online)  spare 
resources  and  employing  mechanisms  to  control  them.  In  the  event  of  a  component  failure, 
the  system  detects  the  problem  and  reconfigures  around  it.  The  following  paragraphs  are 
specific  to  how  this  was  done  in  the  design  of  the  Surface  Ship  OA/MFOP  Demonstration 
system. 

Embedded  Spares 

The  OA/MFOP  proof  of  concept  demonstration  system  was  configured  to  ensure  the 
CNI  operational  capability  would  be  available  for  the  entire  ship’s  deployment  period  of  180 
days.  This  assumed  the  CNI  function  was  needed  continuously,  and  that  the  calculated 
probability  of  mission  success  was  greater  than  99%.  Requirements  were  analyzed  and 
allocated  to  a  potential  solution,  from  which  a  clear  winner  emerged.  A  Blade  Center 
platform was chosen because of the inherent redundancy built into the product design. That
is, the numbers of power, cooling, network communication, processor, and I/O elements
were scalable to meet the reliability demands of the operating period.

The specific device chosen was an IBM Blade Center “T-Chassis®” as it provided
comprehensive measures for component monitoring (advanced management modules), as
well as extended environmental survivability, that is, TELCO hardening standards
NEBS-3/ETSI (see footnote 1). To further improve MTBF, the application server magnetic
hard drives were relocated to the IBM DS3400, a highly redundant storage area network
(SAN) with RAID level 6 applied.

When  Reliability  Block  Diagrams  were  built  to  the  OA/MFOP  demo  system 
configuration and analyzed (using RELIASOFT Inc., Block-Simulator 7), the built-in
redundancy  of  the  system  provided  a  greater  than  99%  probability  of  mission  success.  This 
result  was  expected,  but  what  surprised  the  development  team  was  that  the  one-year  and 
four-year  probabilities  for  R(t)  were  so  high  (see  Figure  2). 
Figure 2. R(t) Probability of Mission Success


1  NEBS  Level  3  Includes  Specifications  GR1089-Core  and  GR63. 
This was an exciting prospect, as most Navy COTS technology Refresh Cycles
occur  in  four-year  increments.  Is  it  possible  that  all  spares  could  be  installed  into  a  system 
from  the  beginning? 

Dealing  With  Vendor  Supplied  MTBF  Numbers 

The MTBF data provided by the vendor is not detailed enough to perform a precision
analysis of failure. We converted vendor MTBF numbers to a constant failure rate
(exponential distribution), under which the likelihood of failure is the same at any
time. In reality, the probability of component failure is higher when a component is new,
and declines to a low, relatively stable level for the bulk of the hardware lifespan,
although failures do still occur during this period. Faults occur on a pseudo-random
basis, following a failure rate profile often referred to as the “bath-tub curve” (see
Figure 3).

It  should  also  be  noted  that  the  slope  and  period  of  these  curves  depend  on  other 
environmental  factors,  and  are  perturbed  by  temperature,  humidity,  vibration,  and  dust.  The 
OA/MFOP  demonstration  system  did  not  attempt  to  deal  with  these  effects  or  de-rate  the 
MTBF  results  to  account  for  a  shipboard  environment.  We  dealt  with  this  uncertainty 
through  environmental  monitoring  and  comparing  empirical  failure  reports  to  the  vendor 
MTBF  data  over  the  course  of  the  system’s  in  service  life. 
Figure 3. Computer Hardware Failure Rate Profile

Additionally,  minimum  thresholds  for  probability  of  mission  success  in  the  face  of 
hardware  failures  can  be  established  to  initiate  service  technician  support  for  the  installed 
system.  Figure  4  depicts  cumulative  failure  density  over  time.  The  system  design 
accounted  for  a  number  of  failures  to  occur  over  the  life  cycle.  As  long  as  the  failure  rate 
falls  below  the  “acceptability  line,”  there  should  be  sufficient  hardware  reliability  remaining  in 
the  system  to  complete  the  stated  mission  time.  This  mission  time  could  be  stated  as  a 
deployment  period  (6  months),  or  a  tech  refresh  cycle  (4  years). 
Figure 4. Repair Action Decision Criteria


Failover 

Hardware  redundancy  is  not  enough.  In  maximizing  full  Operational  Availability,  we 
need  to  examine  “Uptime”  in  relation  to  Total  Mission  Time.  Uptime  is  not  just  the  longevity 
of  a  specific  piece  of  hardware,  but  the  availability  of  the  warfighting  capability. 

A method of automatically detecting faults and automatically responding to them was
established. In the presence of a component failure, processing capacity was redirected
to available embedded spares without operator intervention. This required regular polling
and tracking of system state information, provided to a control mechanism that acted to
restore operation according to a predefined plan. Automatically detecting faults has been
a major focus of system management development effort for NSS projects in the past.
Because the commercial market has developed robust data center management software to
support innovations such as cloud computing, failover and fault recovery capability can
now be acquired rather than hand tooled. The OA/MFOP Demonstration development team
evaluated commercially available software solutions that perform the basic functionality
needed to sustain applications to the warfighter, maintenance free. Based on a market
survey of product capabilities, the IBM Director Management Software product (Version
5.20) was chosen. This product met the requirements for monitoring and failover, and it
also contained a unique feature called “open fabric manager” that managed all World Wide
Names (WWNs) and logical unit numbers (LUNs) for the included application servers, and
could automatically reconnect the application storage volume on the SAN to a spare
processor and resume processing. This greatly simplified the traditionally hard problem
of reconfiguring around failures. With this method, the applications reside at the same
address without any overt additional effort.
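
The detect-and-reconfigure behavior described above can be sketched in a few lines. This
is an illustrative model only, not the IBM Director or open fabric manager interface: the
`Blade` class, its fields, and the `failover` function are hypothetical names, and a real
controller would obtain health state from polled management data rather than from a flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Blade:
    name: str
    healthy: bool = True
    volume: Optional[str] = None  # SAN volume currently attached, None for an idle spare


def failover(blades: List[Blade]) -> List[str]:
    """One polling pass: move each volume on a failed blade to a healthy idle spare."""
    events = []
    for failed in blades:
        if failed.healthy or failed.volume is None:
            continue  # nothing to recover on this blade
        spare = next((b for b in blades if b.healthy and b.volume is None), None)
        if spare is None:
            events.append(f"{failed.volume}: no spare available on failure of {failed.name}")
            continue
        # Re-attach the same storage volume to the spare so the application resumes there.
        spare.volume, failed.volume = failed.volume, None
        events.append(f"{spare.volume}: moved from {failed.name} to {spare.name}")
    return events


blades = [Blade("blade1", volume="cni"), Blade("blade2")]  # one active, one embedded spare
blades[0].healthy = False                                   # simulate a hardware fault
for event in failover(blades):
    print(event)
```

Because the application's storage volume follows it to the spare, the application itself
needs no reconfiguration, which mirrors the simplification the text attributes to the
management software.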

Embedded  spares  and  failover  management  software  are  the  design  features  that 
combine  to  represent  the  fault  tolerant  attributes  of  the  demonstration  system. 

Data  Capture  and  Collection 

In the context of OA/MFOP, ongoing performance monitoring provides the feedback
loop from which all management responses are applied. At the component level, status is
transmitted via SNMP messages, which are trapped and processed by the system software to
assist in failure response. At a higher level, this and other data is collected over time
to analyze performance trends for the purpose of making proactive program support
decisions. The OA/MFOP demonstration system employed a layered approach to data capture
that included time series monitoring of all critical performance and environmental
parameters. This layering was a critical design requirement in order to ensure
scalability to multiple warfighting platforms and domains. The designers were especially
concerned with the disadvantaged network user and the aperiodic communicator. MFOP
performance can be achieved with small but highly targeted system status reports to the
shore-side maintainers. Crucial information made available at the appropriate time
allows decision makers to make prognostic maintenance decisions.

Once a failure has occurred and automatic reconfiguration has been executed according to
the pre-scripted recovery plan embedded in the system, a report is generated. The
distance support specialist can then examine the known state of the system, the remaining
hardware availability, and the likelihood of future component failures (based on life and
environmental conditions), and decide when action is required. Three decisions are
possible: (1) near-term corrective action, with flyaway support personnel, is necessary
to sustain operational availability of the capability during the deployment period; (2)
no action is required, and corrective action can wait until after the deployment is
complete; and (3) no action is required until the next full Technology Insertion event.
The key difference with an OA/MFOP enabled system is that these decisions can be made
throughout the lifespan of the system, and the decision criteria are fully available
throughout the operational command and support infrastructure.
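
The three-way decision can be expressed as a simple rule once the remaining probability of
mission success has been recomputed after a failure. The sketch below is one possible
encoding, not the procedure used in the demonstration: the function name, the two input
probabilities, and the 0.99 threshold are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    FLYAWAY_REPAIR = "near-term corrective action with flyaway support personnel"
    REPAIR_AFTER_DEPLOYMENT = "defer corrective action until the deployment is complete"
    WAIT_FOR_TECH_INSERTION = "no action until the next Technology Insertion event"


def triage(p_finish_deployment: float,
           p_reach_tech_insertion: float,
           threshold: float = 0.99) -> Action:
    """Pick a maintenance decision from the recomputed probabilities of success.

    p_finish_deployment: probability the degraded system completes the deployment.
    p_reach_tech_insertion: probability it lasts until the next tech refresh.
    threshold: minimum acceptable probability (assumed value).
    """
    if p_finish_deployment < threshold:
        return Action.FLYAWAY_REPAIR
    if p_reach_tech_insertion < threshold:
        return Action.REPAIR_AFTER_DEPLOYMENT
    return Action.WAIT_FOR_TECH_INSERTION


print(triage(0.97, 0.80).value)
print(triage(0.995, 0.93).value)
print(triage(0.999, 0.992).value)
```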

The  Specific  OA/MFOP  Demonstration  System  Monitoring  Scheme 

Hardware  Monitoring.  All  replaceable  component  devices  in  the  OA/MFOP  system 
were  monitored.  All  components  within  the  Blade  Center  hardware  boundary  were 
monitored  by  the  two  (redundant)  Advanced  Management  Modules  (AMMs).  Those  external 
to  the  blade  center  were  attached  to  the  Ethernet  network,  and  their  state  data  collected 
through  SNMP  and  Storage  Management  Initiative-Specification  (SMI-S)  message  traps. 
These  data  were  then  interfaced  with  the  IBM  Director  Management  Software  for  monitoring 
and  event  action  response.  Finally,  the  captured  data  were  stored  in  an  Oracle  database 
that  could  be  queried  by  subject-matter  experts,  as  well  as  life  cycle  support  planners, 
project managers, and operational commanders. These data would support off-board
analyses leading to proactive decision-making.

Environmental  Monitoring.  Knowing  the  physical  environment  is  a  key  to 
determining  cause  and  effect  properties  of  the  deployed  hardware.  Most  hardware  failures 
that  occur  outside  the  machine’s  expected  longevity  envelope  are  caused  by  extreme 
temperature,  humidity,  dust,  power  surges,  and  vibration.  The  OA/MFOP  demonstration 
system  included  an  NTI  Inc.  Enviromux  16™  processor  to  collect  and  transmit  this  data  to 
the  management  server.  The  data  were  time  tagged  for  correlation  and  trending  purposes  in 
support  of  off-board  analyses. 

Application  Server  Monitoring.  There  are  several  software  agents  in  the  market 
that  provide  various  levels  and  degrees  of  application  server  monitoring.  Generally,  they  all 
log application uptime, and provide some level of basic resource monitoring, such as CPU
load percentage, memory percentage, I/O throughput levels, and storage system utilization.
The  OA/MFOP  system  selected  and  used  the  IBM  Director  Management  Software  “Level  II 
Managed  Agent®”  product  for  all  application  servers  in  the  system. 
Remote Connectivity

To ensure the OA/MFOP system was supported while deployed, the system was connected to
SIPRNET, over which it sent summary and event reports back to the Off Hull terminal and,
if necessary, could be operationally restored using remote system login and
administration capabilities.

Reporting 

The OA/MFOP system re-used the Remote Off Hull Maintenance Support (ROHMS) software,
developed under a NAVSEA PMS 401 contract for use in the AN/BQQ-10 sonar system, to
transmit status and other maintenance related reports to a connected shore side terminal.
The ROHMS application is constructed on an open source software platform, including the
Apache TOMCAT™ web server and the Firefox™ web browser provided by the Mozilla™
Foundation. The ROHMS feature specifically used in the OA/MFOP demonstration was the
file  transfer  functionality.  It  provided  concise  reports,  most  of  which  used  very  low  network 
bandwidth,  about  the  size  of  a  typical  e-mail  record  (2-20  KB).  Reports  were  based  on 
queries  of  specific  data  elements  held  in  the  OA/MFOP  deployed  system’s  database.  This 
was  not  a  replication  server,  as  limiting  network  communication  bandwidth  was  a  priority. 
Under  normal  conditions,  brief  reports  were  sufficient.  The  OA/MFOP  demonstration 
employed  the  following  reports: 

■  Summary  Status  Report:  Provided  daily,  it  listed  the  status  of  all  hardware, 
environmental  levels,  Application  availability,  and  resource  utilization. 

■  Event  Report:  On  the  occasion  that  a  system  event  or  hardware  failure 
occurred,  the  ROHMS  connector  on  the  ship  would  transmit  an  Event  Report, 
listing  cause,  effect,  and  restorative  action. 

■  Detailed  Report:  A  third  type  of  report  was  also  employed  that  provided  event 
detail  to  be  used  by  SMEs  to  determine  if  follow  up  action  or  planning  was 
necessary. 
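
To illustrate how a daily summary can stay within the 2-20 KB range mentioned above, the
following sketch serializes one status snapshot as compact JSON and reports its size. The
report format actually used by ROHMS is not described here, so the JSON layout, the field
names, and the sample values are all hypothetical.

```python
import json
from datetime import datetime, timezone


def build_summary_report(hardware, environment, app_availability_pct, utilization, now=None):
    """Serialize one daily status snapshot as compact UTF-8 JSON (assumed format)."""
    report = {
        "type": "summary_status",
        "generated_utc": (now or datetime.now(timezone.utc)).isoformat(),
        "hardware": hardware,                      # component name -> status string
        "environment": environment,                # sensor name -> latest reading
        "application_availability_pct": app_availability_pct,
        "resource_utilization": utilization,       # resource name -> percent used
    }
    # Compact separators avoid spending bandwidth on whitespace.
    return json.dumps(report, separators=(",", ":")).encode("utf-8")


payload = build_summary_report(
    hardware={"blade1": "ok", "blade2": "ok", "psu1": "ok", "psu2": "degraded"},
    environment={"temp_c": 24.5, "humidity_pct": 41.0},
    app_availability_pct=100.0,
    utilization={"cpu_pct": 18.0, "memory_pct": 37.0, "storage_pct": 22.0},
)
print(len(payload), "bytes")
```

Sending a queried summary like this, rather than replicating the onboard database, is
what keeps the link usable for a disadvantaged or intermittently connected ship.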

Control 

Distance  support  is  an  alternative  maintenance  concept  that  connects  SMEs  to  the 
ship  system  over  a  network  (in  this  case  SIPRNET)  to  assist  ship’s  force  in  restoring  the 
tactical  operation  of  the  system.  There  are  several  techniques  that  can  be  used  to  assist  in 
this  manner.  The  two  most  popular  are  the  following: 

■  Remote  Collaboration:  useful  for  bridging  Operational  to  Intermediate  Level 
maintenance;  and 

■  Remote  System  Administration:  used  to  login  to  a  system  for  the  purpose  of 
restoring  software  operation. 

The  OA/MFOP  system  employed  two  Remote  System  Administration  techniques 
over  SIPRNET: 

1. Web Browser: A menu driven login using HTTPS with Secure Sockets Layer
(SSL) encryption. It was used in OA/MFOP because the system was
deployed as autonomous, with no ship’s force assistance. This method is
very network bandwidth efficient, but in most instances, the utility provided
does not necessarily require the services of an off-board SME.

2. Virtual Network Computing (VNC): A technique that allows the remote SME
to log in to a specific server/processor at the System Administrator level. VNC
uses frame buffer relay techniques to provide the SME with a remote
interface to the target machine. From there, the system can be analyzed,
restored, and updated. The OA/MFOP system used the RealVNC® product
to positively control the system during deployment. All distance support
objectives were accomplished without any collaboration of ship’s force.

OA/MFOP  Demonstration  System  Deployment 

TEMPALT Planning and Approval

A  Ship  Change  Document  (SCD)  was  prepared  for  installation  aboard  USS  Iwo  Jima. 
The  Ship  Main  process  required  that  the  installation  package  include  drawings,  a  risk 
assessment, and a certified Integrated Logistics Support package. These were scrutinized
and  approved  through  COMNAVSURFLANT.  Since  the  OA/MFOP  system  did  not  require 
open  cabinet  maintenance  throughout  the  deployment  period,  the  certifying  authority  waived 
the  following  ILS  products: 

■  Maintenance  &  Repair  Documentation, 

■  3M  System  Package, 

■  On  Board  Repair  Parts, 

■  Maintenance  Assist  Modules, 

■  System  Drawings, 

■ APL/Supply Support Documentation, and

■  Crew  Training.  (The  crew  was  briefed  and  given  the  procedure  for  an 
emergency  shutdown  only.) 

Information  Assurance  Challenges 

In  order  to  demonstrate  Remote  Connectivity  capabilities,  the  OA/MFOP  system  was 
required  to  undergo  Information  Assurance  (IA)  certification  by  NAVNETWARCOM.  An 
Interim  Authority  To  Test  (IATT)  was  sought  for  a  six-month  test  period.  Leading  up  to  the 
OA/MFOP  demonstration  test  date,  there  were  no  known  Navy  ship  systems  that  had  been 
granted  approval  to  use  remote  connectivity  for  maintenance  of  tactical  systems  over 
SIPRNET.  It  is  noteworthy  that  the  ROHMS  capability  had  been  granted  a  one-day  test  on 
SIPRNET,  but  had  not  been  approved  for  use  on  a  deployed  submarine.  Although  the  data 
being  collected  over  ROHMS  is  UNCLASSIFIED,  the  system  application  (CNI)  was 
designed  to  interface  to  classified  sensors  (Link  16)  and  to  “Text  Chat”  among  various  units 
of  the  strike  group,  rendering  the  entire  system  “SECRET.” 

Developers beware: The concept of operations (CONOPS) and the bandwidth used on
Navy networks are of particular importance to those who validate and approve DoD
Information Assurance Certification and Accreditation Process (DIACAP)
application packages. Generally, a candidate system will be required to demonstrate
network communications behavior with all vulnerability patches applied. Depending on the
scope and intensity of network interaction, as well as the mission assurance category (MAC)
level, a number of interoperability tests conducted on a simulated tactical network will likely
be necessary to gain approval of the DIACAP document. This certification is then used to
request NAVNETWARCOM approval for the desired level of network connectivity, that is,
Authority to Operate. Collaboration with the Echelon II IA representative should begin at
least one year in advance of the accreditation need date.

The OA/MFOP demonstration project reused ROHMS and CNI from prior programs
that had already undergone Navy network testing. There were sufficient elements of
similarity among the systems and their interfaces to the network that OA/MFOP met the
demonstration requirement “by analysis.”

Surface  Ship  OA/MFOP  Demonstration  Results 

The demonstration was completed in January 2011. The TEMPALT system was then
removed over the last week of February 2011. Statistical performance details will be
published in a report in late summer 2011. A quick-look report includes the following
highlights:

■ The measured operational availability of the CNI operational software was
99.67% over the deployment period. The remaining unavailability
(0.33%) was due to the two failures that the test team induced to measure the
automatic failover response of the system. The operational availability of the
ROHMS application server was measured at 100%, as ROHMS was not
intentionally failed while deployed.

■ The physical environment was relatively benign. Temperatures hovered
around 25 °C, while humidity and power were stable and generally reflective
of laboratory conditions.

■ There were no actual hardware failures over the course of the MFOP
deployment period. In fact, the system has been in virtually continuous
operation for two years with no physical failures noted. This speaks to the
inherent reliability of today’s Enterprise IT systems.

■ Six Distance Support objectives were successfully demonstrated. These
were designed to eliminate the need for shipboard ILS products, as well as
Fleet Technical Assistance “Fly-Away” time and cost. These included the
following:

o Monitoring All Hardware Status;
o Monitoring Server Operations/Resources;
o Collecting System Availability and Environmental Data;
o Remotely Inducing Simulated Failures and Observing Automatic Failover
and Recovery Using Embedded Spares; and
o Performing Remote IT, including Restarts, Pushing Files, Adding
Applications, and Correcting Code Errors.

OA/MFOP  in  the  Context  of  Total  Ownership  Cost 

Operation and support costs can make up 70% of the total ownership cost of a
system. A significant portion of these costs is attributable to spares, maintenance training,
and their associated infrastructure. OA/MFOP targets these specific cost contributors for
elimination.

Figure 5. Impact of MFOP Design in Overall Program Costs

Figure 6. Impact of MFOP in Technology Insertion Life Cycle Strategy

ILS development tasks are redirected to Life Cycle Engineering purposes (Failure
Modes, Effects, and Criticality Analysis, and the like), which feed back to System Engineering
for evolutionary improvement. Thus, the modernization schedule becomes the life cycle
support strategy.

Figure 7. Cost Elements Targeted for Elimination by MFOP Design

Bounding the MFOP Environment

The OA/MFOP boundary determines the level of savings. The goal should be to
include the entire system within the OA/MFOP boundary. Figure 8 shows the MFOP
boundaries of the submarine sonar pilot (2005) through the surface ship demonstration
(2010). Based on the market research and implementation of COTS technologies in the
surface ship design, it is suggested that a majority of the Navy’s tactical information systems
can implement the OA/MFOP design model across the entire system. The benefit is
the complete elimination of the traditional ILS support package and the corresponding
reduction in infrastructure.

Figure 8. MFOP Boundaries Determine Level of Savings

Phased  Implementation  in  a  Strategic  Stepwise  Manner 

Designing  to  an  MFOP  solution  for  sustaining  capability  in  the  field  can  be 
accomplished  with  low  risk  when  starting  with  a  new  system  design.  However,  many 
programs in the Navy today are doing product improvements to existing systems. For this
case, MFOP capability can be achieved in a stepwise manner. We prescribe a set of steps
to  get  the  most  value  in  the  shortest  time  while  ultimately  driving  to  reduce  shipboard 
maintenance  to  the  point  of  elimination. 

The first step is to capture the value of distance support from ship to shore through a
network connection that bridges the organic system maintainer (O) level and the
intermediate subject-matter expert and tech assist (I) level. This O-to-I Level Maintenance
Bridge requires little product integration and will immediately generate cost savings. Table 1
highlights an example program that achieved a 15:1 cost savings ratio when employing
distance support services instead of deploying tech assists.

Table  1.  Cost  Data  for  Fleet  Technology  Assistance 


Fleet  Tech  Assist  Data  For  Submarine  Enterprise 

■  120  FTA  Events  Performed 

□  93  Local  (Norfolk) 

□  27  Out-Of-Area 

■  100%  Distance  Support  (DS)  Attempts  (CFFC  /  Command  Policy) 

□  16%  Success  Rate  Overall  On  All  FTA  Events 

□  37%  Success  Rate  On  Out-Of-Area  Events 

■  Average  MHs  Per  Event 

□  19  MH  Via  DS 

□  164  MH  Via  On-Site  Support 

■ Average Cost Per Event (Based On $60.00 Per Hour)

□ $1,140.00 For DS

□ $9,840.00 Labor and $5,550.00 Travel For On-Site ($15,390.00)

15:1  Cost  Savings  When  DS  is  Successful 
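
The per-event figures in Table 1 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are assumptions, and the inputs are the man-hour, labor rate, and travel figures listed in the table. Using only those itemized figures, the on-site to distance support cost ratio works out to about 13.5:1, so the 15:1 headline figure presumably reflects rounding or costs that the table does not itemize.

```python
# Figures copied from Table 1; all names here are illustrative assumptions.
LABOR_RATE = 60.00        # dollars per man-hour
DS_MAN_HOURS = 19         # average man-hours per event via distance support
ONSITE_MAN_HOURS = 164    # average man-hours per event via on-site support
ONSITE_TRAVEL = 5550.00   # average travel cost per on-site event, in dollars


def event_cost(man_hours, travel=0.0, rate=LABOR_RATE):
    """Average cost of one tech assist event: labor plus any travel."""
    return man_hours * rate + travel


ds_cost = event_cost(DS_MAN_HOURS)
onsite_cost = event_cost(ONSITE_MAN_HOURS, ONSITE_TRAVEL)
ratio = onsite_cost / ds_cost

print(f"Distance support: ${ds_cost:,.2f}")
print(f"On-site support:  ${onsite_cost:,.2f}")
print(f"Cost ratio:       {ratio:.1f}:1")
```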

These  methods  generated  faster  response  time  for  solving  the  system  problem,  as 
well  as  lowering  labor  and  travel  costs.  A  secondary  effect  of  preferentially  using  distance 
support  vice  on-site  fleet  tech  assists  is  that  more  fleet  problems  per  unit  time  can  be  solved 
by  a  single  subject-matter  expert. 

The next step in this strategic path is to establish data collection in the system. The
collected information can be used by the distance support elements to rapidly focus on
problem areas and solve issues quickly. It also supports system health and status
reporting to a variety of stakeholders, including operational commanders, so that they have
up-to-date awareness of the ability of their platforms to support assigned missions.
System components can be instrumented quickly through built-in test (BiTe)
and the component information that is inherently available in commercial computer systems
through such mechanisms as the Simple Network Management Protocol (SNMP). There is a
rich variety of SNMP collection agents on the market, including open source software, that
capture data already available in networked systems. Products such as ROHMS (the data
collection, reduction, and dissemination utilities developed under the OA/MFOP program)
have been designed to capture this data and to report system health and status information
in a way that specifically accommodates low network bandwidth.
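
To illustrate why data reduction matters on a bandwidth-limited link, the following minimal sketch collapses a batch of raw readings into a short summary record before transmission. It is not a description of how ROHMS works: the record layout, the field names, the component name "server-1", and the sample values are all hypothetical.

```python
import json
import statistics


def reduce_samples(component, metric, samples):
    """Collapse raw readings into one compact summary record (hypothetical layout)."""
    return {
        "c": component,
        "m": metric,
        "n": len(samples),
        "min": min(samples),
        "max": max(samples),
        "avg": round(statistics.fmean(samples), 2),
    }


def encode(record):
    """Serialize a record as compact JSON bytes, as it might be sent ashore."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


# 600 hypothetical cabinet temperature readings in degrees Celsius.
raw = [24.8, 25.1, 25.0, 25.3, 24.9, 25.2] * 100

summary = reduce_samples("server-1", "temp_c", raw)
raw_bytes = len(encode({"c": "server-1", "m": "temp_c", "v": raw}))
summary_bytes = len(encode(summary))

print(summary)
print(f"raw: {raw_bytes} bytes, summary: {summary_bytes} bytes")
```

Sending the summary instead of every reading trades detail for bandwidth. A real design would also need a way to request the raw data when a summary looks abnormal.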

Fault tolerant system design through built-in spares and automated failover is the
next of the strategic steps. This step requires a change in the hardware baseline to add the
resources that support failover, and it is the tipping point that makes the MFOP concept
feasible for a full deployment period. Several programs in the Navy have achieved some level
of embedded redundancy and automated failover, but only in the context of eliminating single
points of failure, which traditionally would be immediately corrected by the O-level maintainer.
MFOP designs also eliminate single points of failure, but add the dimension of measuring the
rest of the system and determining when in the future repairs need to take place in order to
sustain a required probability of mission success. This is done by developing
reliability block diagrams and by creating automated fault recovery routines and heuristics that
sustain tactical function in the face of component failures. Prognostic maintenance
decisions, vice reactive maintenance actions, represent the biggest shift in culture for the
current fleet support environment.
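
The reliability block diagram reasoning described above can be sketched numerically. The example below is a simplified model under stated assumptions, not the analysis used in the OA/MFOP program: it assumes constant failure rates (exponential lifetimes), independent failures, active spares that take over instantly, and blocks connected in series. The block list and MTBF values are hypothetical.

```python
import math


def unit_reliability(mtbf_hours, t_hours):
    """Probability that one unit survives t hours (exponential model, assumed)."""
    return math.exp(-t_hours / mtbf_hours)


def k_of_n(k, n, r):
    """Probability that at least k of n identical, independent units survive."""
    return sum(math.comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def system_reliability(blocks, t_hours):
    """Series combination of k-of-n blocks, each given as (needed, installed, mtbf)."""
    prob = 1.0
    for needed, installed, mtbf in blocks:
        prob *= k_of_n(needed, installed, unit_reliability(mtbf, t_hours))
    return prob


def mfop_hours(blocks, required, step=24, limit=24 * 365 * 5):
    """Longest period, in whole steps, for which reliability stays >= required."""
    t = 0
    while t + step <= limit and system_reliability(blocks, t + step) >= required:
        t += step
    return t


# Hypothetical system: servers, network switches, and power supplies.
no_spares = [(2, 2, 50_000), (1, 1, 100_000), (1, 1, 80_000)]
with_spares = [(2, 3, 50_000), (1, 2, 100_000), (1, 2, 80_000)]

REQUIRED = 0.95  # assumed required probability of mission success
print("MFOP without spares:", mfop_hours(no_spares, REQUIRED), "hours")
print("MFOP with spares:   ", mfop_hours(with_spares, REQUIRED), "hours")
```

Under these assumptions, the point at which the computed reliability drops below the required level is the latest time a repair can be deferred. This is the prognostic decision the text describes, as opposed to repairing each failure as it occurs.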

The final step, reworking the life cycle planning, can be achieved quickly through
programmatic restructuring once the previous three technical steps are complete. When
the facilities for distance support, data collection and dissemination, and fault tolerant MFOP
designs are in place, the next logical step is to retool the fleet maintenance support
infrastructure to take full advantage of distance support and to eliminate as much open
cabinet maintenance as possible. This is also where Technology Insertion strategies can be
revised to take full advantage of the MFOP concept to establish new life cycle strategies, as
previously described.

How  Does  The  Navy  Drive  Change? 

To  effectively  eliminate  support  infrastructure,  Program  Sponsors  must  hand  down 
strong  top-level  requirements  (TLRs)  for  total  ownership  cost  reductions  to  Program 
Managers  for  execution.  This  can  be  a  significant  challenge  for  a  couple  of  reasons: 

1. Modernization budgets rarely support the full range of proposed
improvements,  and  capability  enhancements  are  generally  prioritized  above 
those  aimed  at  creating  efficiencies  in  operating  costs;  and 

2.  The  budget  lines  for  O&MN  infrastructure  elements  are  carved  out  before  the 
Program  Sponsor  level.  These  costs  are  distributed  to  training  commands 
and  supply  chain  management,  and  thus  the  acquisition  offices  have  no 
insight  into  the  potential  cost  savings  possible  with  an  OA/MFOP  solution. 

Only  with  full  cost  auditing  at  the  highest  levels  of  Program  budget  distribution  can  a 
complete  cost  profile  be  quantified. 

In practice, it is common for Program Sponsors and Acquisition Managers to collaborate on
TLRs ahead of time (B. Johnson², personal communication, March 2011).
(Strategies used by PMS 425 and OPNAV N87 to specify COTS requirements and methods for
A-RCI acquisition leading to Open Architecture implementation.) A hard operational
requirement would certainly be the purview of the OPNAV Sponsor, with its technical
implementation requirements left to the acquisition community. For example, if the Sponsor
wants to reduce total ownership costs, the acquisition manager may offer OA/MFOP as a
method of eliminating at-sea maintenance cost and reducing support infrastructure. If agreed,
a suitable requirement is then codified. This requirement may be expressed as an
improvement in Operational Availability, whereby the system must be restored within five
minutes of the detection and verification of a hardware failure. In practice, this requirement
could only be met in a system designed to be fault tolerant. Similar requirements for
maintenance data collection and distance support (over Navy networks) functionality could be
specified in the solicitation (Request For Proposal) with incentives weighted toward full
OA/MFOP proposals.

² Bill Johnson is the inaugural program manager for A-RCI.
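
A rough calculation shows why a five-minute restoration requirement implies a fault tolerant design. The sketch below uses the common steady-state approximation of operational availability as uptime divided by uptime plus downtime. The MTBF and the manual repair time are hypothetical values chosen only for comparison; they are not measurements from the demonstration.

```python
def operational_availability(mtbf_hours, mean_downtime_hours):
    """Steady-state approximation: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mean_downtime_hours)


MTBF_HOURS = 2000.0           # hypothetical mean time between failures
AUTO_RESTORE_HOURS = 5 / 60   # five-minute restoration from the example requirement
MANUAL_REPAIR_HOURS = 8.0     # hypothetical open cabinet repair time

auto = operational_availability(MTBF_HOURS, AUTO_RESTORE_HOURS)
manual = operational_availability(MTBF_HOURS, MANUAL_REPAIR_HOURS)

print(f"Automated failover: {auto:.5%}")
print(f"Manual repair:      {manual:.5%}")
```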

Commercial  Trends 

There are two areas where commercial IT needs are driving the development of high
availability solutions: datacenter management software and redundancy/auto-recovery/failover
solutions. Industry investment in cloud computing related technologies is
racing ahead to support high availability solutions such as software-as-a-service and virtual
offices. Companies like IBM offer technologies and services under the monikers Resiliency
Services, which address availability, and Recovery Services, which address failover. Both
have the same purpose that we require of an MFOP environment: to protect the availability
of the client’s IT. The former is geared toward continuous 24x7 operation of the target system,
while the latter maximizes the integrity of the data, with some flexibility in restoration time. The
technology innovation itself is driven by large enterprise business needs for continuous, secure
data services. The business sectors driving these product development areas
include the following:

1.  Banking/Financial  Services, 

2.  Distribution  Centers, 

3.  Public  Administration,  and 

4.  Industrial. 

Summary/Conclusion 

The  Naval  Enterprise  has  made  significant  strides  with  Open  Architecture  and  COTS 
technologies. Significant budget pressure, coupled with fleet operational demands, makes it
clear  that  we  must  reduce  costs  and  increase  availability  using  the  resources  we  have  and 
by  combining  them  in  new,  smarter  delivery  packages.  The  techniques  described  in  this 
paper,  instantiated  on  USS  Iwo  Jima,  graphically  demonstrate  the  power  and  savings 
potential  of  the  Maintenance  Free  Operating  Period  concept.  MFOP  will  dramatically  cut 
costs  in  training,  repair,  and  sustainment  logistics,  while  pushing  availability  to  new  levels  of 
excellence.  The  only  thing  that  stands  in  the  way  of  an  MFOP  future  where  we  purposefully 
reduce  shipboard  maintenance  to  the  absolute  minimum,  thus  allowing  our  warfighters  to 
concentrate  on  fighting,  is  the  will  to  require  this  in  our  systems,  and  grow  it  across  the 
Naval  Enterprise. 